Feasible Policy Iteration
Safe reinforcement learning (RL) aims to solve an optimal control problem
under safety constraints. Existing safe RL methods use the
original constraint throughout the learning process. They either lack
theoretical guarantees for the policy during iteration or suffer from
infeasibility problems. To address this issue, we propose a
safe RL method called feasible policy iteration (FPI) that
iteratively uses the feasible region of the last policy to constrain the
current policy. The feasible region is represented by a feasibility function
called the constraint decay function (CDF). The core of FPI is a region-wise policy
update rule called feasible policy improvement, which maximizes the return
under the constraint of the CDF inside the feasible region and minimizes the
CDF outside the feasible region. This update rule is always feasible and
ensures that the feasible region monotonically expands and the state-value
function monotonically increases inside the feasible region. Using the feasible
Bellman equation, we prove that FPI converges to the maximum feasible region
and the optimal state-value function. Experiments on classic control tasks and
Safety Gym show that our algorithms achieve lower constraint violations and
comparable or higher performance than the baselines.
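The region-wise update rule described above can be illustrated on a tiny tabular problem. This is a minimal sketch under stated assumptions, not the authors' implementation. The following are all assumptions made here for illustration: the 5-state chain, its reward, policy evaluation by fixed-point iteration, the CDF recursion F(s) = c(s) + (1 - c(s)) * gamma * F(s'), where c(s) = 1 marks a violating state, and taking the feasible region as {s : F(s) = 0}.

```python
import numpy as np

GAMMA = 0.9

# Hypothetical toy problem (not from the paper): a 5-state chain with
# actions 0 = left and 1 = right. Reward grows toward state 4, but state 4
# violates the constraint, so a pure return maximizer would walk into it.
N_STATES = 5
NEXT_STATE = np.array([[max(s - 1, 0), min(s + 1, N_STATES - 1)] for s in range(N_STATES)])
REWARD = np.tile(np.arange(N_STATES, dtype=float)[:, None], (1, 2))
VIOLATION = np.array([0.0, 0.0, 0.0, 0.0, 1.0])


def evaluate(policy, iters=500):
    """Evaluate the return V and the assumed CDF F of a deterministic policy."""
    idx = np.arange(N_STATES)
    s_next = NEXT_STATE[idx, policy]
    V = np.zeros(N_STATES)
    F = np.zeros(N_STATES)
    for _ in range(iters):
        V = REWARD[idx, policy] + GAMMA * V[s_next]
        # Assumed CDF recursion: F(s) = c(s) + (1 - c(s)) * gamma * F(s').
        # F is 0 if the policy never violates, gamma**k if the first violation is k steps away.
        F = VIOLATION + (1.0 - VIOLATION) * GAMMA * F[s_next]
    return V, F


def feasible_policy_improvement(policy):
    """One region-wise update; returns the new policy and the old feasible region."""
    V, F = evaluate(policy)
    Q = REWARD + GAMMA * V[NEXT_STATE]
    QF = VIOLATION[:, None] + (1.0 - VIOLATION[:, None]) * GAMMA * F[NEXT_STATE]
    feasible = F == 0.0  # feasible region of the last policy
    new_policy = policy.copy()
    for s in range(N_STATES):
        if feasible[s]:
            # Inside the region: maximize return among actions that keep the CDF at 0.
            # Once evaluation has converged, the current action has QF = F(s) = 0,
            # so this set is non-empty (the update is always feasible).
            allowed = np.flatnonzero(QF[s] == 0.0)
            new_policy[s] = allowed[np.argmax(Q[s, allowed])]
        else:
            # Outside the region: minimize the CDF, i.e. postpone or avoid violation.
            new_policy[s] = int(np.argmin(QF[s]))
    return new_policy, feasible


def feasible_policy_iteration(policy, n_iter=10):
    """Repeat the update; records the feasible region used at each step."""
    regions = []
    for _ in range(n_iter):
        policy, feasible = feasible_policy_improvement(policy)
        regions.append(feasible)
    return policy, regions
```

Starting from the always-right policy, which enters the violating state from everywhere, `feasible_policy_iteration(np.ones(N_STATES, dtype=int))` should first pull every state away from state 4, then raise the return inside the resulting feasible region {0, 1, 2, 3}. It settles on moving right except in state 3, where the only action that keeps the CDF at zero is left. In this toy run, the recorded regions only grow, which mirrors the monotone-expansion claim in the abstract.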
Willingness to pay for climate change mitigation: evidence from China
China has become the largest emitter of carbon dioxide in the world. However, the Chinese public's willingness to pay (WTP) for climate change mitigation is, at best, under-researched. This study draws upon a large national survey of Chinese public cognition of and attitudes towards climate change and analyzes the determinants of consumers' WTP for energy-efficient and environment-friendly products. Eighty-five percent of respondents indicate that they are willing to pay at least 10 percent more than the market price for these products. The econometric analysis indicates that income, education, age and gender, as well as public awareness of and concern about climate change, are significant factors influencing WTP. Respondents who are more knowledgeable about and more concerned with the adverse effects of climate change show higher WTP. In comparison, the income elasticity of WTP is small. The results are robust to different model specifications and estimation techniques. © 2016 by the IAEE. All rights reserved.